Identifying Temporal Expression and its Syntactic Role Using FST and Lexical Data from Corpus

Authors

Juntae Yoon

Yoonkwan Kim

Mansuk Song

Abstract

Accurate analysis of temporal expressions is crucial for Korean text processing applications such as information extraction, and for chunking that supports efficient syntactic analysis. It is a complicated problem because temporal expressions are often ambiguous in their syntactic role. This paper discusses two problems: (1) representing and identifying the temporal expression, and (2) distinguishing the syntactic function of the temporal expression when it has a dual syntactic role. In this paper, temporal expressions and the context used for disambiguation, called the local context, are represented using lexical data extracted from a corpus and a finite state transducer. Experiments show that the method is effective for temporal expression analysis. In particular, our approach shows that corpus-based work can produce promising results for a problem in a restricted domain, in that we can effectively deal with a large amount of lexical data.

1 Introduction

Accurate analysis of temporal expressions is crucial for text processing applications such as information extraction, and for chunking that supports efficient syntactic analysis. In information extraction, a user might want to get a piece of information about an event. Typically, the event is related to a date or time, which is represented by a temporal expression. Chunking helps efficient syntactic analysis by removing irrelevant intermediate constituents generated during parsing. It involves dividing sentences into non-overlapping segments. As a result of chunking, parsing becomes a problem of analysis inside chunks and between chunks (Yoon et al., 1999). Chunking prevents the parser from producing intermediate structures irrelevant to the final output, which makes the parser efficient without losing accuracy. Thus, chunking is an essential stage for application systems such as MT that must pursue both efficiency and precision.
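The idea of recognizing temporal expressions with a finite state transducer and grouping them into non-overlapping chunks can be illustrated with a small sketch. This is not the authors' implementation: the English toy lexicon, the word classes `MOD` and `TNOUN`, the state names, and the function `tag_temporal` are all hypothetical, chosen only to show how a transition table driven by lexical data can map a token sequence to chunk tags.

```python
# Toy lexicon: hypothetical word classes, NOT the corpus-derived lexical data of the paper.
LEXICON = {
    "last": "MOD", "next": "MOD", "this": "MOD",
    "summer": "TNOUN", "winter": "TNOUN", "year": "TNOUN",
    "week": "TNOUN", "monday": "TNOUN",
}

# Transition table (assumed for illustration): (state, word class) -> next state.
TRANSITIONS = {
    ("START", "MOD"): "AFTER_MOD",
    ("START", "TNOUN"): "TEMPORAL",
    ("AFTER_MOD", "TNOUN"): "TEMPORAL",
    ("TEMPORAL", "TNOUN"): "TEMPORAL",
}
FINAL_STATES = {"TEMPORAL"}


def tag_temporal(tokens):
    """Map each token to B-TMP, I-TMP, or O using longest-match scanning."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        state, j, last_final = "START", i, None
        # Follow transitions as far as possible from position i.
        while j < len(tokens):
            word_class = LEXICON.get(tokens[j].lower())
            next_state = TRANSITIONS.get((state, word_class))
            if next_state is None:
                break
            state = next_state
            j += 1
            if state in FINAL_STATES:
                last_final = j  # remember the longest accepted span so far
        if last_final is None:
            i += 1  # no temporal expression starts here
        else:
            tags[i] = "B-TMP"
            for k in range(i + 1, last_final):
                tags[k] = "I-TMP"
            i = last_final  # chunks never overlap
    return tags


if __name__ == "__main__":
    sentence = "I visited Seoul last summer".split()
    print(list(zip(sentence, tag_temporal(sentence))))
```

Because the scan resumes at the end of each accepted span, the output segments cannot overlap, which matches the chunking requirement described above. Deciding whether a recognized temporal chunk modifies a noun or a verb would need additional context and is not covered by this sketch.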
Korean, an agglutinative language, has well-developed functional words such as postpositions and endings, by which the grammatical function of a phrase is decisively determined. Besides, because it is a head-final language, so that the head always follows its complement, chunking is relatively easy. However, we also face an ambiguity problem in chunking, which is often due to temporal expressions. This is because many temporal nouns are used as modifiers of both nouns and verbs in a sentence. Let us consider the following examples: [Example] 1a. jinan(last) ...(summer)

Similar resources

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

A semantic tagger for the Finnish language

This paper reports on the current status and evaluation of a Finnish semantic tagger (hereafter FST), which was developed in the EU-funded Benedict Project. In this project, we have ported the Lancaster English semantic tagger (USAS) to the Finnish language. We have re-used the existing software architecture of USAS, and applied the same semantic field taxonomy developed for English to Finnish....

Textuality of Idiomatic Expressions in Cameroon English

The meaning of an idiomatic expression cannot be transparently worked out from the meanings of its constituent words due to its figurative and unpredictable nature. Consequently, the syntactic composition and the structural paradigm of an idiomatic expression are supposed to be the same in every context. However, this is not the case in the institutionalized second language varieties of English...

Politeness Strategies and Politeness Markers in Email Request Sent by Iranian EFL Learners to Professors

This study investigates politeness strategies and politeness markers in email requests sent by Iranian male and female EFL learners to professors. The strategies used by males and females in email requests were also compared. The data consisted of 52 actual emails from M.A. students of TEFL studying at Azad University. To analyze the corpus, politeness strate...

Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations

The main task of the tokenization is to divide the sentences of the text into its constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical writing chain that is an independent semantic unit. Tokenization occurs at the word level and the extracted units can be used as input to other components such as stemmer. The requirement to create...

Publication date: 2000
